NLP

Learning Personalized End-to-End Goal-Oriented Dialog

This is an AAAI 2019 paper on introducing personalization into task-oriented dialog. It models user profiles to produce personalized responses and to resolve semantic ambiguity.

Introduction

Current research on end-to-end task-oriented dialog systems focuses on generating responses purely from the dialog content, without tailoring answers to users with different personalities. Such content-based models have the following shortcomings:

  1. the inability to adjust language style flexibly
  2. the lack of a dynamic conversation policy based on the interlocutor’s profile
  3. the incapability of handling ambiguities in user requests

Figure 1: Examples to show the common issues with content-based models. We can see that the content-based model (1) is incapable of adjusting appellations and language styles, (2) fails to provide the best candidate, and (3) fails to choose the correct answer when facing ambiguities. (a) Three dialogs are chosen from the personalized bAbI dialog dataset. Personalized and content-based responses are generated by the PERSONALIZED MEMN2N and a standard memory network, respectively. (b) Examples of valid candidates from a knowledge base that match the user request.

Fig 1 illustrates the difference between responses generated by a traditional content-based model and by the personalized model proposed in this paper:

  1. The content-based model cannot adapt appellations and language style to the situation; its responses are relatively uniform.
  2. When recommending candidate options, the content-based model can only present them in a random order, whereas the personalized model dynamically adjusts its recommendation policy according to the user's profile.
  3. The word "contact" in the dialog can be interpreted as either phone or social media, both of which are slot attributes in the knowledge base; the personalized model can resolve the ambiguity with learned profile knowledge (e.g., young users tend to prefer social media while older users prefer phone).

Psychologists have proven that during a dialog humans tend to adapt to their interlocutor to facilitate understanding, which enhances conversational efficiency (Brown 1965; Brown 1987; Kroger and Wood 1992).

The paper proposes a Profile Model and a Preference Model: the former learns personalization from a distributed representation of the user profile and uses a global memory that stores the dialog contexts of users with similar profiles, so as to choose appropriate language style and recommendation policy; the latter learns a preference over ambiguous candidates by modeling the connection between the profile and the knowledge base. Both models are built on memory networks, and the authors combine them into the PERSONALIZED MEMN2N.

_Mainly about personalization in chit-chat dialog systems; also worth referencing._

End-to-End Memory Network

The memory network used in this paper mainly follows Bordes, Boureau, and Weston (2017), _Learning end-to-end goal-oriented dialog_. The MEMN2N consists of two components: context memory and next response prediction.

As the model conducts a conversation with the user, utterance (from the user) and response (from the model) are in turn appended to the memory. At any given time step t there are $c_{1}^{u},…,c_{t}^{u}$ user utterances and $c_{1}^{r},…,c_{t-1}^{r}$ model responses. The aim at time t is to retrieve the next response $c_{t}^{r}$.

Memory Representation

Following Dodge et al. (2015), _Evaluating prerequisite qualities for learning end-to-end dialog systems_, each utterance is represented as a bag of words and stored in the memory as:

$$m_{i}=A\phi(c_{i})$$

Here $\phi(\cdot)$ maps an utterance to a bag-of-words vector of length V, where V is the vocabulary size (with the vocabulary in a fixed order, each utterance corresponds to a V-dimensional vector whose entries are 1 if the corresponding word appears in the utterance and 0 otherwise). A is a $d \times V$ embedding matrix, where d is the embedding dimension.

In addition, to encode the speaker of each utterance and its position in the dialog, V is extended with 1000 extra time features and two features (#u, #r) marking the speaker identity. The latest user utterance $c_{t}^{u}$ is encoded as $q=A\phi(c_{t}^{u})$, which serves as the initial query at time t, using the same embedding matrix A.
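To make the memory representation concrete, here is a minimal numpy sketch of the bag-of-words encoding with the 1000 time features and the two speaker features (#u, #r) described above. The toy vocabulary, the `build_phi` helper and all sizes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def build_phi(vocab, n_time_features=1000):
    """Return phi: (utterance, time, speaker) -> bag-of-words vector of length V."""
    V = len(vocab) + n_time_features + 2          # words + time features + (#u, #r)
    word_index = {w: i for i, w in enumerate(vocab)}

    def phi(utterance, t, speaker):
        x = np.zeros(V)
        for w in utterance.lower().split():
            if w in word_index:
                x[word_index[w]] = 1.0            # word presence
        x[len(vocab) + min(t, n_time_features - 1)] = 1.0  # time feature for step t
        x[-2 if speaker == "user" else -1] = 1.0  # speaker feature #u / #r
        return x

    return phi, V

vocab = ["hi", "book", "a", "table", "please"]
phi, V = build_phi(vocab)
d = 32
A = np.random.randn(d, V) * 0.1                   # embedding matrix A (d x V)

m_1 = A @ phi("hi", t=0, speaker="user")          # one memory slot m_i = A phi(c_i)
q = A @ phi("book a table please", t=1, speaker="user")  # initial query q = A phi(c_t^u)
```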

Memory Operation

The memory operation computes attention between the initial query q at time t and each memory slot $m_{i}$, adds the attention output o back to the query to form a new query, and repeats this process for N hops:

$$\alpha_{i}=softmax(q^{T}m_{i})$$
$$o=R\sum_{i}\alpha_{i}m_{i}$$
$$q_{2}=q+o$$

The process is repeated for N hops, with hop k using $q_{k}$ as the query.

Let $r_{i}=W\phi(y_{i})$, where $W\in R^{d \times V}$ is another embedding matrix and y is the set of candidate agent responses. The query $q_{N+1}$ obtained after N hops is compared with each $r_{i}$ by inner product, and a softmax yields the predicted response distribution:

$$\hat{a}=softmax(q_{N+1}^{T}r_{1},\ldots,q_{N+1}^{T}r_{C})\:\:(2)$$

where C is the size of the candidate set y, i.e., there are C candidate responses in total.
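As a sanity check on the two steps above, the following is a minimal numpy sketch of the multi-hop memory read and the response scoring. All matrices are random placeholders rather than trained parameters, and the function name is my own.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memn2n_respond(q, M, R_cands, Rmat, n_hops=3):
    """M: (T, d) memory slots m_i; R_cands: (C, d) candidate embeddings r_i = W phi(y_i)."""
    for _ in range(n_hops):
        alpha = softmax(M @ q)        # alpha_i = softmax(q^T m_i)
        o = Rmat @ (alpha @ M)        # o = R sum_i alpha_i m_i
        q = q + o                     # q_{k+1} = q_k + o
    return softmax(R_cands @ q)       # predicted distribution over the C candidates

d, T, C = 32, 6, 10
rng = np.random.default_rng(0)
dist = memn2n_respond(q=rng.normal(size=d), M=rng.normal(size=(T, d)),
                      R_cands=rng.normal(size=(C, d)), Rmat=rng.normal(size=(d, d)))
print(dist.argmax())                  # index of the response the model would pick
```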

Personalized Dialog System

The authors propose two models, the Profile Model and the Preference Model: the Profile Model captures the speaker's personality with an explicit profile embedding and an implicit global memory, while the Preference Model captures the speaker's preference over KB entities.

The two models are independent of each other; the authors merge them into a single model, PERSONALIZED MEMN2N, whose joint architecture is shown in Fig 2:

Figure 2: PERSONALIZED MEMN2N architecture. The incoming user utterance is embedded into a query vector. The model first reads the memory (at top-left) to find relevant history and produce attention weights. Then it generates an output vector by taking the weighted sum followed by a linear transformation. Part (1) is Profile Embedding: the profile vector p is added to the query at each iteration, and is also used to revise the candidate responses r. Part (2) is Global Memory: this component (at bottom-left) has an identical structure as the original MEMN2N, but it contains history utterances from other similar users. Part (3) is Personalized Preference: the bias term is obtained based on the user preference and added to the prediction logits.

Notation

Each user has a predefined profile consisting of n attributes $\left \{ \left ( k_{i},v_{i} \right ) \right \}_{i=1}^{n}$, where $k_{i}$ and $v_{i}$ are the name and value of the i-th attribute, e.g. {(Gender, Male); (Age, Young); (Dietary, Non-vegetarian)}. The i-th attribute is represented as a one-hot vector $a_{i}\in R^{d_{i}}$, where $d_{i}$ is the number of possible values of attribute $k_{i}$. All $a_{i}$ are concatenated to obtain the final profile embedding $\tilde{a}=Concat(a_{1},\ldots,a_{n})\in R^{d^{(p)}}$, with $d^{(p)}=\sum_{i=1}^{n}d_{i}$.
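A small sketch of how the profile embedding $\tilde{a}$ can be assembled by concatenating one-hot attribute vectors; the attribute schema below is a made-up example, not the dataset's exact value set.

```python
import numpy as np

PROFILE_SCHEMA = {                    # assumed attribute names and values
    "Gender": ["Male", "Female"],
    "Age": ["Young", "Middle-aged", "Elderly"],
    "Dietary": ["Vegetarian", "Non-vegetarian"],
}

def profile_embedding(profile):
    parts = []
    for attr, values in PROFILE_SCHEMA.items():
        a_i = np.zeros(len(values))               # one-hot a_i in R^{d_i}
        a_i[values.index(profile[attr])] = 1.0
        parts.append(a_i)
    return np.concatenate(parts)                  # tilde{a} in R^{d^{(p)}}

a_tilde = profile_embedding({"Gender": "Male", "Age": "Young", "Dietary": "Non-vegetarian"})
print(a_tilde)        # d^{(p)} = 2 + 3 + 2 = 7 dimensions
```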

Profile Model

The Profile Model fuses profile information into the query and consists of two modules: profile embedding and global memory.

Profile Embedding

In the MEMN2N, the query q plays a key role in both reading memory and choosing the response, while it contains no information about the user. We expect to add a personalized information term to q at each iteration of the query.

First, $\tilde{a}$ is linearly transformed: $p=P\tilde{a}$, where $P\in R^{d\times d^{(p)}}$, so that the resulting profile embedding p has the same dimension d as the query and memory embeddings. p is then added to the query at every hop of the memory network:

$$q_{i+1}=q_{i}+o_{i}+p\:\:(3)$$

Similarly, profile information should also be injected when scoring the candidate responses:
$$r_{i}^{*}=\sigma (p^{T}r_{i})\cdot r_{i}\:\:(4)$$

where $\sigma$ is the sigmoid function; $r_{i}^{*}$ replaces $r_{i}$ in Eq 2.
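A minimal sketch of the Profile Model's two changes: adding p at every hop (Eq 3) and gating the candidates with the profile (Eq 4). Tensors are random toys and the function name is my own, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def profile_memn2n(q, M, R_cands, Rmat, p, n_hops=3):
    for _ in range(n_hops):
        alpha = softmax(M @ q)                    # attention over memory slots
        o = Rmat @ (alpha @ M)                    # weighted sum + linear map
        q = q + o + p                             # Eq 3: profile added at each hop
    r_star = sigmoid(R_cands @ p)[:, None] * R_cands   # Eq 4: r_i^* = sigma(p^T r_i) r_i
    return softmax(r_star @ q)                    # personalized response distribution

d, T, C = 32, 6, 10
rng = np.random.default_rng(0)
dist = profile_memn2n(q=rng.normal(size=d), M=rng.normal(size=(T, d)),
                      R_cands=rng.normal(size=(C, d)), Rmat=rng.normal(size=(d, d)),
                      p=rng.normal(size=d))
```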

Global Memory

Users with similar profiles may expect the same or a similar response for a certain request. Therefore, instead of using the profile directly, we also implicitly integrate personalized information of an interlocutor by utilizing the conversation history from similar users as a global memory. The definition of similarity varies with task domains. In this paper, we regard those with the same profile as similar users.

The computation is identical to the MEMN2N, except that the memory stores the dialog history of similar users:

After N hops we obtain the final global query $q_{N+1}^{(g)}$, which is added to the output of the main memory network: $q^{+}=q_{N+1}+q_{N+1}^{(g)}$.
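A one-hop toy illustration of combining the personal and global memory reads into $q^{+}$; in the actual model both reads run for N hops and share the structure shown earlier.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read(memory, q, Rmat):            # a single memory hop: q <- q + R sum_i alpha_i m_i
    alpha = softmax(memory @ q)
    return q + Rmat @ (alpha @ memory)

d = 32
rng = np.random.default_rng(1)
q, Rmat = rng.normal(size=d), rng.normal(size=(d, d))
own_history = rng.normal(size=(6, d))        # this user's dialog memory
similar_users = rng.normal(size=(40, d))     # utterances from users with the same profile
q_plus = read(own_history, q, Rmat) + read(similar_users, q, Rmat)  # q^+ = q_{N+1} + q^{(g)}_{N+1}
```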

Preference Model

So far, the ambiguity over KB entities has not been addressed.

The ambiguity refers to the user preference when more than one valid entity is available for a specific request. We propose inferring such preference by taking the relation between the user profile and the knowledge base into account.

As shown in Fig 1, each row of the KB is a complete item and each column corresponds to an attribute; $e_{i,j}$ denotes the entity value at row i and column j.

The Preference Model is defined as follows: given the user profile and a KB with K columns, the user's preference v over the K columns is first modeled from the profile embedding:

$$v=E\tilde{a}$$

where $E\in R^{K\times d^{(p)}}$ and $v\in R^{K}$. Here the authors make an assumption:

Note that we assume the bot cannot provide more than one option in a single response, so a candidate can contain at most one entity.

If a response contains a KB entity, the probability of selecting it should be influenced by the user's preference. A bias term $b=\beta(v,r,m)\in R^{C}$ is therefore defined; its k-th component $b_{k}$ (corresponding to the k-th candidate response) is computed as follows:

  • If the k-th candidate response contains no KB entity, $b_{k}=0$;
  • If the k-th candidate response contains a KB entity $e_{i,j}$, then $b_{k}=\lambda(i,j)$.

Here ctx denotes the current conversation context: $\lambda(i,j)=v_{j}$ if an entity from row i has already appeared in ctx, and $\lambda(i,j)=0$ otherwise.

For example, the candidate “Here is the information: The Place Phone” contains a KB entity “The Place Phone”
which belongs to restaurant “The Place” and column “Phone”. If “The Place” has been mentioned in the conversation, the bias term for this response should be $v_{Phone}$.
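The sketch below shows one way the preference bias b could be computed: project the profile onto the K KB columns, then, for each candidate that mentions a KB entity whose restaurant already appears in the context, add the corresponding column preference. The toy KB, the entity strings, and the naive substring matching are all illustrative assumptions.

```python
import numpy as np

KB_COLUMNS = ["Phone", "Address", "Cuisine"]                 # K = 3 columns (assumed)
KB = {("The_Place", "Phone"): "The_Place_Phone",             # (row, column) -> entity value
      ("The_Place", "Address"): "The_Place_Address"}

def preference_bias(candidates, ctx, v):
    """b_k = v_j if candidate k contains entity e_{i,j} and row i appears in ctx, else 0."""
    b = np.zeros(len(candidates))
    for k, cand in enumerate(candidates):
        for (row, col), entity in KB.items():
            if entity in cand and row in ctx:
                b[k] = v[KB_COLUMNS.index(col)]
    return b

d_p = 7
rng = np.random.default_rng(2)
E = rng.normal(size=(len(KB_COLUMNS), d_p))
a_tilde = np.array([1, 0, 1, 0, 0, 0, 1], dtype=float)       # profile embedding, d^{(p)} = 7
v = E @ a_tilde                                              # preference over the K columns

b = preference_bias(["Here is the information: The_Place_Phone",
                     "What can I do for you?"],
                    ctx="I would like to book The_Place", v=v)
# b[0] == v[0] (the Phone column), b[1] == 0; b is then added to the prediction logits.
```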

Eq 2 then becomes, with the bias added to the prediction logits before the softmax:

$$\hat{a}=softmax(q_{N+1}^{T}r_{1}+b_{1},\ldots,q_{N+1}^{T}r_{C}+b_{C})$$

Combined Model

Combining the two models yields the PERSONALIZED MEMN2N: the query is augmented with the profile embedding and the global memory read, the candidates are revised by the profile, and the preference bias is added to the prediction logits, as shown in Fig 2.

Experiments

_Details in original paper Learning Personalized End-to-End Goal-Oriented Dialog_

Dataset

The personalized bAbI dialog dataset (Joshi, Mi, and Faltings 2017) is a multi-turn dialog corpus extended from the bAbI dialog dataset (Bordes, Boureau, and Weston 2017). It introduces an additional user profile associated with each dialog and updates the utterances and KB entities to integrate personalized style. Five separate tasks in a restaurant reservation scenario are introduced along with the dataset. Here we briefly introduce them for better understanding of our experiments. More details on the dataset can be found in the work by Joshi, Mi, and Faltings (2017).

Each task comes in two versions: a full set and a small set with 1,000 dialogs.

Results

Table 1: Evaluation results of the PERSONALIZED MEMN2N on the personalized bAbI dialog dataset. Rows 1 to 3 are baseline models. Rows 4 to 6 are the PROFILE MODEL with profile embedding, global memory and both of them, respectively. In each cell, the first number represents the per-response accuracy on the full set, and the number in parenthesis represents the accuracy on a smaller set with 1000 dialogs.

Conclusion and Future Work

We introduce a novel end-to-end model for personalization in goal-oriented dialog. Experiment results on open datasets and further analysis show that the model is capable of overcoming some existing issues in dialog systems. The model improves the effectiveness of the bot responses with personalized information, and thus greatly outperforms state-of-the-art methods.

In future work, more representations of personality beyond profile attributes can be introduced into goal-oriented dialog models. Besides, we may explore learning profile representations for non-domain-specific tasks and consider KBs with more complex formats such as ontologies.
